Unigram Backoff vs. TnT Evaluating Part of Speech Taggers Introduction to Computational Linguistics

نویسنده

  • Igor Boehm
چکیده

Automated statistical part-of-speech (POS) tagging has been a very active research area for many years and is the foundation of natural language processing systems. This paper introduces and analyzes the performance of two part-of-speech taggers, namely the NLTK unigram backoff tagger and the TnT tagger, a trigram tagger. Experimental results show that the TnT tagger outperforms the NLTK unigram backoff tagger. However, the tagging accuracy of both taggers decreases almost by the same number when the taggers are tested on a corpus which differs from the one they were trained on.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Tagging Accuracy Analysis on Part-of-Speech Taggers

Part of Speech (POS) Tagging can be applied by several tools and several programming languages. This work focuses on the Natural Language Toolkit (NLTK) library in the Python environment and the gold standard corpora installable. The corpora and tagging methods are analyzed and compared by using the Python language. Different taggers are analyzed according to their tagging accuracies with data ...

متن کامل

Amharic Part-of-Speech Tagger for Factored Language Modeling

This paper presents Amharic part of speech taggers developed for factored language modeling. Hidden Markov Model (HMM) and Support Vector Machine (SVM) based taggers have been trained using the TnT and SVMTool. The overall accuracy of the best performing TnTand SVM-based taggers is 82.99% and 85.50%, respectively. Generally, with respect to accuracy SVM-based taggers perform better than TnTbase...

متن کامل

The Open Source Tagger HunPoS for Swedish

HunPoS, a freely available open source part-of-speech tagger—a reimplementation of one of the best performing taggers, TnT—is applied to Swedish and evaluated when the tagger is trained on various sizes of training data. The tagger’s accuracy is compared to other data-driven taggers for Swedish. The results show that the tagging performance of HunPoS is as accurate as TnT and can be used effici...

متن کامل

Data-Driven Part-of-Speech Tagging of Kiswahili

In this paper we present experiments with data-driven part-of-speech taggers trained and evaluated on the annotated Helsinki Corpus of Swahili. Using four of the current state-of-the-art data-driven taggers, TnT, MBT, SVMTool and MXPOST, we observe the latter as being the most accurate tagger for the Kiswahili dataset.We further improve on the performance of the individual taggers by combining ...

متن کامل

Fast Domain Adaptation for Part of Speech Tagging for Dialogues

Part of speech tagging accuracy deteriorates severely when a tagger is used out of domain. We investigate a fast method for domain adaptation, which provides additional in-domain training data from an unannotated data set by applying POS taggers with different biases to the unannotated data set and then choosing the set of sentences on which the taggers agree. We show that we improve the accura...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005